As we’ve seen throughout this project, the intervention of COVID-19 in a great deal of these time series has been an unavoidable presence. In the case of more traditional model fitting methods, the drop in 2020 has proven to cause struggles when attempting to holistically understand the data and produce convincing forecasts. Therefore, this page will plot time series data with this interruption taken into account, comparing real and predicted data from after the interruption with counterfactuals from predictions if COVID-19 had not made such a great impact.
Considering we are mainly interested in the impact of COVID-19 on public transit ridership, this section will contain interrupted time series (ITS) analysis on our original national ridership data. Additionally, to obtain conclusions on what causes certain trends within the data, we will look at the ridership data from the nine cities identified in the Data Visualization section, chosen based on the greatest volume of users in 2025.
Note: Much of the code and process on this page is re-purposed from:
Gamage, P. (2026). Applied Time Series for Data Science.
Code
library (tidyverse)
library (ggplot2)
library (forecast)
library (astsa)
library (xts)
library (tseries)
library (fpp2)
library (fma)
library (lubridate)
library (tidyverse)
library (TSstudio)
library (quantmod)
library (tidyquant)
library (plotly)
library (dplyr)
Code
main <- read_csv ("../data/ridership.csv" )
df <- read_csv ("../data/monthly_data.csv" )
df <- df[, c ("Month" ,
"UPT - New York--Jersey City--Newark, NY--NJ" ,
"UPT - Los Angeles--Long Beach--Anaheim, CA" ,
"UPT - Chicago, IL--IN" ,
"UPT - Washington--Arlington, DC--VA--MD" ,
"UPT - San Francisco--Oakland, CA" ,
"UPT - Philadelphia, PA--NJ--DE--MD" ,
"UPT - Boston, MA--NH" ,
"UPT - Seattle--Tacoma, WA" ,
"UPT - Miami--Fort Lauderdale, FL" )]
df$ Month <- as.Date (df$ Month, format = "%m/%d/%y" )
df <- df[order (df$ Month),]
head (df)
Visualizing the Interrupted Data
The variables that I will use as interrupted time series are the national public transit ridership dataset, as well as monthly public transit ridership for the nine American cities with the most ridership. All of these variables were greatly affected by COVID-19 which caused an unprecedented dip in ridership nationwide, making them good candidates for ITS analysis.
Code
main$ Date <- as.Date (as.yearqtr (main$ ` Quarter-Year ` , format = "%Y - Q%q" ))
main_plot <- plot_ly (main, x = ~ Date) %>%
add_lines (y = ~ ` Total Ridership (000s) ` , name = "National Ridership" , line = list (color = 'navy' )) %>%
add_segments (x = as.Date ("2020-03-01" ), xend = as.Date ("2020-03-01" ), y = 0 , yend = 2700000 , line = list (color = 'red' ), name = "Interruption" ) %>%
layout (
title = "National Quarterly Public Transit Ridership" ,
xaxis = list (title = "Date" ),
yaxis = list (title = "Ridership (Total Unlinked Passenger Trips - 000s)" ),
legend = list (title = list (text = "Legend" ))
)
main_plot
Code
plot <- plot_ly (df, x = ~ Month) %>%
add_lines (y = ~ ` UPT - New York--Jersey City--Newark, NY--NJ ` , name = "New York, NY" , line = list (color = 'darkgreen' )) %>%
add_lines (y = ~ ` UPT - Los Angeles--Long Beach--Anaheim, CA ` , name = "Los Angeles, CA" , line = list (color = 'orange' )) %>%
add_lines (y = ~ ` UPT - Chicago, IL--IN ` , name = "Chicago, IL" , line = list (color = '#cccc00' )) %>%
add_lines (y = ~ ` UPT - Washington--Arlington, DC--VA--MD ` , name = "Washington, DC" , line = list (color = 'green' )) %>%
add_lines (y = ~ ` UPT - San Francisco--Oakland, CA ` , name = "San Francisco" , line = list (color = 'blue' )) %>%
add_lines (y = ~ ` UPT - Philadelphia, PA--NJ--DE--MD ` , name = "Philadelphia, PA" , line = list (color = 'purple' )) %>%
add_lines (y = ~ ` UPT - Boston, MA--NH ` , name = "Boston, MA" , line = list (color = 'violet' )) %>%
add_lines (y = ~ ` UPT - Seattle--Tacoma, WA ` , name = "Seattle, WA" , line = list (color = 'brown' )) %>%
add_lines (y = ~ ` UPT - Miami--Fort Lauderdale, FL ` , name = "Miami, FL" , line = list (color = 'cyan' )) %>%
add_segments (x = as.Date ("2020-03-01" ), xend = as.Date ("2020-03-01" ), y = 0 , yend = 420000000 , line = list (color = 'red' ), name = "Interruption" ) %>%
layout (
title = "Public Transit Monthly Ridership by City" ,
xaxis = list (title = "Date" ),
yaxis = list (title = "Ridership (Total Unlinked Passenger Trips)" ),
legend = list (title = list (text = "Legend" ))
)
plot
As we can see, all of these cities’ ridership data experience a great deal of interruption specifically in March of 2020. This will serve as our benchmark upon which interrupted time series methods will be applied.
Fitting and Plotting Interrupted Time Series Models
To prepare this data for ITS analysis, the following variables will be considered:
Y: Public transit ridership
Xt: Sequential time, starting at 1
Zt: Intervention dummy; 0 if prior to COVID-19, 1 if after
Pt: Time since intervention; starting at 1 for the first observation after COVID-19
The visualizations will include the following data:
Observed Data: The actual recorded ridership data
Predicted (Pre-COVID): Linear fit of pre-COVID data
Predicted (Post-COVID): Linear fit of post-COVID data
Linear Counterfactual: Continuation of the pre-COVID linear fit
ARIMA Counterfactual: ARIMA forecast fit on pre-COVID data
Code
library (kableExtra)
library (knitr)
intervention_date <- as.Date ("2020-03-01" )
df$ Xt = seq_along (df$ Month)
df$ Zt = ifelse (df$ Month < intervention_date, 0 , 1 )
df$ Pt = 0
df$ Pt[df$ Month >= intervention_date] <- seq_len (sum (df$ Month >= intervention_date))
intervention_index <- which (df$ Month == intervention_date)
tail (df)
Code
main$ Xt = seq_along (main$ Date)
main$ Zt = ifelse (main$ Date < intervention_date, 0 , 1 )
main$ Pt = 0
main$ Pt[main$ Date >= intervention_date] <- seq_len (sum (main$ Date >= intervention_date))
intervention_index <- which (main$ Date == intervention_date)
tail (main)
Code
library (broom)
library (knitr)
library (kableExtra)
regTS <- lm (` Total Ridership (000s) ` ~ Xt + Zt + Pt, data = main)
summary_table <- tidy (regTS)
summary_table <- summary_table %>%
mutate (
stars = case_when (
p.value < 0.001 ~ "***" ,
p.value < 0.01 ~ "**" ,
p.value < 0.05 ~ "*" ,
p.value < 0.1 ~ "." ,
TRUE ~ ""
),
` Estimate (SE) ` = paste0 (round (estimate, 2 ), stars, " \n (" , round (std.error, 2 ), ")" )
)
summary_table %>%
kable (align = "l" , escape = FALSE ) %>%
kable_styling (full_width = FALSE )
(Intercept)
2073141.047
23515.9618
88.15889
0
***
2073141.05*** (23515.96)
Xt
5663.267
358.0742
15.81590
0
***
5663.27*** (358.07)
Zt
-1969264.124
69122.0464
-28.48967
0
***
-1969264.12*** (69122.05)
Pt
72448.938
6743.0564
10.74423
0
***
72448.94*** (6743.06)
Code
main_ts <- ts (main$ ` Total Ridership (000s) ` [1 : 112 ], start= c (1992 ,1 ), frequency = 4 )
auto.arima (main_ts)
Code
main_fit <- Arima (main_ts, order = c (1 ,1 ,0 ), seasonal = list (order = c (0 ,1 ,2 ), period = 4 ))
main_pred <- forecast (main_fit, 17 )
main_pred <- data.frame (main_pred)
Code
main$ Y_hat <- predict (regTS, newdata = main)
datanew <- data.frame (
Xt = main$ Xt,
Zt = 0 ,
Pt = 0
)
main$ Y_counterfactual <- predict (regTS, newdata = datanew)
main_fig <- plot_ly ()
main_fig <- main_fig %>%
add_lines (x = ~ main$ Xt, y = ~ main$ ` Total Ridership (000s) ` ,
name = "Observed Ridership" ,
line = list (color = '#cccccc' ))
# Predicted line before and after COVID
main_fig <- main_fig %>%
add_lines (x = ~ main$ Xt[1 : 112 ], y = ~ main$ Y_hat[1 : 112 ],
name = "Predicted (Pre-COVID)" ,
line = list (color = '#355464' )) %>%
add_lines (x = ~ main$ Xt[113 : nrow (main)], y = ~ main$ Y_hat[113 : nrow (main)],
name = "Predicted (Post-COVID)" ,
line = list (color = '#68A2B9' ))
main_fig <- main_fig %>%
add_lines (x = ~ main$ Xt[113 : nrow (main)], y = ~ main$ Y_counterfactual[113 : nrow (main)],
name = "Linear Counterfactual (No COVID)" ,
line = list (color = '#68A2B9' , dash = 'dash' ))
main_fig <- main_fig %>%
add_lines (x = ~ main$ Xt[113 : nrow (main)], y = ~ main_pred$ Point.Forecast,
name = "ARIMA Counterfactual (No COVID)" ,
line = list (color = '#99D9D9' ))
main_fig <- main_fig %>%
add_lines (x = c (112 , 112 ), y = c (min (main$ ` Total Ridership (000s) ` ), max (main$ ` Total Ridership (000s) ` )),
name = "COVID-19 Intervention" ,
line = list (color = '#E9072B' , dash = 'dot' ),
showlegend = TRUE )
main_fig <- main_fig %>%
layout (title = "National Ridership: Predicted vs. Counterfactual (COVID-19)" ,
xaxis = list (title = "Time (quarters)" ),
yaxis = list (title = "Total Passenger Trips" ),
legend = list (x = 0.1 , y = 0.1 ))
main_fig
Code
nyc_regTS <- lm (` UPT - New York--Jersey City--Newark, NY--NJ ` ~ Xt + Zt + Pt, data = df)
nyc_summary_table <- tidy (nyc_regTS)
nyc_summary_table <- nyc_summary_table %>%
mutate (
stars = case_when (
p.value < 0.001 ~ "***" ,
p.value < 0.01 ~ "**" ,
p.value < 0.05 ~ "*" ,
p.value < 0.1 ~ "." ,
TRUE ~ ""
),
` Estimate (SE) ` = paste0 (round (estimate, 2 ), stars, " \n (" , round (std.error, 2 ), ")" )
)
nyc_summary_table %>%
kable (align = "l" , escape = FALSE ) %>%
kable_styling (full_width = FALSE )
(Intercept)
313943504.7
2922324.48
107.42938
0
***
313943504.69*** (2922324.48)
Xt
245181.5
23138.82
10.59611
0
***
245181.51*** (23138.82)
Zt
-249853411.5
6326317.53
-39.49429
0
***
-249853411.51*** (6326317.53)
Pt
3213165.1
161932.68
19.84260
0
***
3213165.12*** (161932.68)
Code
nyc_ts <- ts (df$ ` UPT - New York--Jersey City--Newark, NY--NJ ` [1 : 218 ], start= c (2002 ,1 ), frequency = 12 )
auto.arima (nyc_ts)
Code
nyc_fit <- Arima (nyc_ts, order = c (1 ,0 ,3 ), seasonal = list (order = c (0 ,1 ,1 ), period = 12 ), include.drift = TRUE )
nyc_pred <- forecast (nyc_fit, 60 )
nyc_pred <- data.frame (nyc_pred)
Code
df$ nyc_Y_hat <- predict (nyc_regTS, newdata = df)
datanew <- data.frame (
Xt = df$ Xt,
Zt = 0 ,
Pt = 0
)
df$ nyc_Y_counterfactual <- predict (nyc_regTS, newdata = datanew)
nyc_fig <- plot_ly ()
nyc_fig <- nyc_fig %>%
add_lines (x = ~ df$ Xt, y = ~ df$ ` UPT - New York--Jersey City--Newark, NY--NJ ` ,
name = "Observed Ridership" ,
line = list (color = '#cccccc' ))
# Predicted line before and after COVID
nyc_fig <- nyc_fig %>%
add_lines (x = ~ df$ Xt[1 : 218 ], y = ~ df$ nyc_Y_hat[1 : 218 ],
name = "Predicted (Pre-COVID)" ,
line = list (color = '#355464' )) %>%
add_lines (x = ~ df$ Xt[219 : nrow (df)], y = ~ df$ nyc_Y_hat[219 : nrow (df)],
name = "Predicted (Post-COVID)" ,
line = list (color = '#68A2B9' ))
nyc_fig <- nyc_fig %>%
add_lines (x = ~ df$ Xt[219 : nrow (df)], y = ~ df$ nyc_Y_counterfactual[219 : nrow (df)],
name = "Linear Counterfactual (No COVID)" ,
line = list (color = '#68A2B9' , dash = 'dash' ))
nyc_fig <- nyc_fig %>%
add_lines (x = ~ df$ Xt[219 : nrow (df)], y = ~ nyc_pred$ Point.Forecast,
name = "ARIMA Counterfactual (No COVID)" ,
line = list (color = '#99D9D9' ))
nyc_fig <- nyc_fig %>%
add_lines (x = c (218 , 218 ), y = c (min (df$ ` UPT - New York--Jersey City--Newark, NY--NJ ` ), max (df$ ` UPT - New York--Jersey City--Newark, NY--NJ ` )),
name = "COVID-19 Intervention" ,
line = list (color = '#E9072B' , dash = 'dot' ),
showlegend = TRUE )
nyc_fig <- nyc_fig %>%
layout (title = "New York, NY Ridership: Predicted vs. Counterfactual (COVID-19)" ,
xaxis = list (title = "Time (months)" ),
yaxis = list (title = "Total Passenger Trips" ),
legend = list (x = 0.1 , y = 0.1 ))
nyc_fig
Code
la_regTS <- lm (` UPT - Los Angeles--Long Beach--Anaheim, CA ` ~ Xt + Zt + Pt, data = df)
la_summary_table <- tidy (la_regTS)
la_summary_table <- la_summary_table %>%
mutate (
stars = case_when (
p.value < 0.001 ~ "***" ,
p.value < 0.01 ~ "**" ,
p.value < 0.05 ~ "*" ,
p.value < 0.1 ~ "." ,
TRUE ~ ""
),
` Estimate (SE) ` = paste0 (round (estimate, 2 ), stars, " \n (" , round (std.error, 2 ), ")" )
)
la_summary_table %>%
kable (align = "l" , escape = FALSE ) %>%
kable_styling (full_width = FALSE )
(Intercept)
57202713.10
648032.096
88.271420
0
***
57202713.1*** (648032.1)
Xt
-36238.43
5131.087
-7.062525
0
***
-36238.43*** (5131.09)
Zt
-29188387.32
1402875.292
-20.806117
0
***
-29188387.32*** (1402875.29)
Pt
359782.98
35908.941
10.019315
0
***
359782.98*** (35908.94)
Code
la_ts <- ts (df$ ` UPT - Los Angeles--Long Beach--Anaheim, CA ` [1 : 218 ], start= c (2002 ,1 ), frequency = 12 )
auto.arima (la_ts)
Code
la_fit <- Arima (la_ts, order = c (1 ,1 ,0 ), seasonal = list (order = c (2 ,0 ,0 ), period = 12 ), include.drift = TRUE )
la_pred <- forecast (la_fit, 60 )
la_pred <- data.frame (la_pred)
Code
df$ la_Y_hat <- predict (la_regTS, newdata = df)
df$ la_Y_counterfactual <- predict (la_regTS, newdata = datanew)
la_fig <- plot_ly ()
la_fig <- la_fig %>%
add_lines (x = ~ df$ Xt, y = ~ df$ ` UPT - Los Angeles--Long Beach--Anaheim, CA ` ,
name = "Observed Ridership" ,
line = list (color = '#cccccc' ))
# Predicted line before and after COVID
la_fig <- la_fig %>%
add_lines (x = ~ df$ Xt[1 : 218 ], y = ~ df$ la_Y_hat[1 : 218 ],
name = "Predicted (Pre-COVID)" ,
line = list (color = '#355464' )) %>%
add_lines (x = ~ df$ Xt[219 : nrow (df)], y = ~ df$ la_Y_hat[219 : nrow (df)],
name = "Predicted (Post-COVID)" ,
line = list (color = '#68A2B9' ))
la_fig <- la_fig %>%
add_lines (x = ~ df$ Xt[219 : nrow (df)], y = ~ df$ la_Y_counterfactual[219 : nrow (df)],
name = "Linear Counterfactual (No COVID)" ,
line = list (color = '#68A2B9' , dash = 'dash' ))
la_fig <- la_fig %>%
add_lines (x = ~ df$ Xt[219 : nrow (df)], y = ~ la_pred$ Point.Forecast,
name = "ARIMA Counterfactual (No COVID)" ,
line = list (color = '#99D9D9' ))
la_fig <- la_fig %>%
add_lines (x = c (218 , 218 ), y = c (min (df$ ` UPT - Los Angeles--Long Beach--Anaheim, CA ` ), max (df$ ` UPT - Los Angeles--Long Beach--Anaheim, CA ` )),
name = "COVID-19 Intervention" ,
line = list (color = '#E9072B' , dash = 'dot' ),
showlegend = TRUE )
la_fig <- la_fig %>%
layout (title = "Los Angeles, CA Ridership: Predicted vs. Counterfactual (COVID-19)" ,
xaxis = list (title = "Time (months)" ),
yaxis = list (title = "Total Passenger Trips" ),
legend = list (x = 0.1 , y = 0.1 ))
la_fig
Code
chicago_regTS <- lm (` UPT - Chicago, IL--IN ` ~ Xt + Zt + Pt, data = df)
chicago_summary_table <- tidy (chicago_regTS)
chicago_summary_table <- chicago_summary_table %>%
mutate (
stars = case_when (
p.value < 0.001 ~ "***" ,
p.value < 0.01 ~ "**" ,
p.value < 0.05 ~ "*" ,
p.value < 0.1 ~ "." ,
TRUE ~ ""
),
` Estimate (SE) ` = paste0 (round (estimate, 2 ), stars, " \n (" , round (std.error, 2 ), ")" )
)
chicago_summary_table %>%
kable (align = "l" , escape = FALSE ) %>%
kable_styling (full_width = FALSE )
(Intercept)
51414621.715
514408.823
99.9489500
0.0000000
***
51414621.72*** (514408.82)
Xt
-2825.619
4073.064
-0.6937331
0.4884373
-2825.62 (4073.06)
Zt
-37282566.604
1113604.453
-33.4791824
0.0000000
***
-37282566.6*** (1113604.45)
Pt
331459.708
28504.569
11.6283008
0.0000000
***
331459.71*** (28504.57)
Code
chicago_ts <- ts (df$ ` UPT - Chicago, IL--IN ` [1 : 218 ], start= c (2002 ,1 ), frequency = 12 )
auto.arima (chicago_ts)
Code
chicago_fit <- Arima (chicago_ts, order = c (2 ,1 ,1 ), seasonal = list (order = c (2 ,1 ,2 ), period = 12 ))
chicago_pred <- forecast (chicago_fit, 60 )
chicago_pred <- data.frame (chicago_pred)
Code
df$ chicago_Y_hat <- predict (chicago_regTS, newdata = df)
df$ chicago_Y_counterfactual <- predict (chicago_regTS, newdata = datanew)
chicago_fig <- plot_ly ()
chicago_fig <- chicago_fig %>%
add_lines (x = ~ df$ Xt, y = ~ df$ ` UPT - Chicago, IL--IN ` ,
name = "Observed Ridership" ,
line = list (color = '#cccccc' ))
# Predicted line before and after COVID
chicago_fig <- chicago_fig %>%
add_lines (x = ~ df$ Xt[1 : 218 ], y = ~ df$ chicago_Y_hat[1 : 218 ],
name = "Predicted (Pre-COVID)" ,
line = list (color = '#355464' )) %>%
add_lines (x = ~ df$ Xt[219 : nrow (df)], y = ~ df$ chicago_Y_hat[219 : nrow (df)],
name = "Predicted (Post-COVID)" ,
line = list (color = '#68A2B9' ))
chicago_fig <- chicago_fig %>%
add_lines (x = ~ df$ Xt[219 : nrow (df)], y = ~ df$ chicago_Y_counterfactual[219 : nrow (df)],
name = "Linear Counterfactual (No COVID)" ,
line = list (color = '#68A2B9' , dash = 'dash' ))
chicago_fig <- chicago_fig %>%
add_lines (x = ~ df$ Xt[219 : nrow (df)], y = ~ chicago_pred$ Point.Forecast,
name = "ARIMA Counterfactual (No COVID)" ,
line = list (color = '#99D9D9' ))
chicago_fig <- chicago_fig %>%
add_lines (x = c (218 , 218 ), y = c (min (df$ ` UPT - Chicago, IL--IN ` ), max (df$ ` UPT - Chicago, IL--IN ` )),
name = "COVID-19 Intervention" ,
line = list (color = '#E9072B' , dash = 'dot' ),
showlegend = TRUE )
chicago_fig <- chicago_fig %>%
layout (title = "Chicago, IL Ridership: Predicted vs. Counterfactual (COVID-19)" ,
xaxis = list (title = "Time (months)" ),
yaxis = list (title = "Total Passenger Trips" ),
legend = list (x = 0.1 , y = 0.1 ))
chicago_fig
Code
dc_regTS <- lm (` UPT - Washington--Arlington, DC--VA--MD ` ~ Xt + Zt + Pt, data = df)
dc_summary_table <- tidy (dc_regTS)
dc_summary_table <- dc_summary_table %>%
mutate (
stars = case_when (
p.value < 0.001 ~ "***" ,
p.value < 0.01 ~ "**" ,
p.value < 0.05 ~ "*" ,
p.value < 0.1 ~ "." ,
TRUE ~ ""
),
` Estimate (SE) ` = paste0 (round (estimate, 2 ), stars, " \n (" , round (std.error, 2 ), ")" )
)
dc_summary_table %>%
kable (align = "l" , escape = FALSE ) %>%
kable_styling (full_width = FALSE )
(Intercept)
38396910.189
481581.202
79.730916
0.000000
***
38396910.19*** (481581.2)
Xt
-7548.626
3813.137
-1.979637
0.048745
*
-7548.63* (3813.14)
Zt
-30031001.404
1042538.438
-28.805654
0.000000
***
-30031001.4*** (1042538.44)
Pt
406239.630
26685.516
15.223226
0.000000
***
406239.63*** (26685.52)
Code
dc_ts <- ts (df$ ` UPT - Washington--Arlington, DC--VA--MD ` [1 : 218 ], start= c (2002 ,1 ), frequency = 12 )
auto.arima (dc_ts)
Code
dc_fit <- Arima (dc_ts, order = c (4 ,1 ,1 ), seasonal = list (order = c (2 ,1 ,2 ), period = 12 ))
dc_pred <- forecast (dc_fit, 60 )
dc_pred <- data.frame (dc_pred)
Code
df$ dc_Y_hat <- predict (dc_regTS, newdata = df)
df$ dc_Y_counterfactual <- predict (dc_regTS, newdata = datanew)
dc_fig <- plot_ly ()
dc_fig <- dc_fig %>%
add_lines (x = ~ df$ Xt, y = ~ df$ ` UPT - Washington--Arlington, DC--VA--MD ` ,
name = "Observed Ridership" ,
line = list (color = '#cccccc' ))
# Predicted line before and after COVID
dc_fig <- dc_fig %>%
add_lines (x = ~ df$ Xt[1 : 218 ], y = ~ df$ dc_Y_hat[1 : 218 ],
name = "Predicted (Pre-COVID)" ,
line = list (color = '#355464' )) %>%
add_lines (x = ~ df$ Xt[219 : nrow (df)], y = ~ df$ dc_Y_hat[219 : nrow (df)],
name = "Predicted (Post-COVID)" ,
line = list (color = '#68A2B9' ))
dc_fig <- dc_fig %>%
add_lines (x = ~ df$ Xt[219 : nrow (df)], y = ~ df$ dc_Y_counterfactual[219 : nrow (df)],
name = "Linear Counterfactual (No COVID)" ,
line = list (color = '#68A2B9' , dash = 'dash' ))
dc_fig <- dc_fig %>%
add_lines (x = ~ df$ Xt[219 : nrow (df)], y = ~ dc_pred$ Point.Forecast,
name = "ARIMA Counterfactual (No COVID)" ,
line = list (color = '#99D9D9' ))
dc_fig <- dc_fig %>%
add_lines (x = c (218 , 218 ), y = c (min (df$ ` UPT - Washington--Arlington, DC--VA--MD ` ), max (df$ ` UPT - Washington--Arlington, DC--VA--MD ` )),
name = "COVID-19 Intervention" ,
line = list (color = '#E9072B' , dash = 'dot' ),
showlegend = TRUE )
dc_fig <- dc_fig %>%
layout (title = "Washington, DC Ridership: Predicted vs. Counterfactual (COVID-19)" ,
xaxis = list (title = "Time (months)" ),
yaxis = list (title = "Total Passenger Trips" ),
legend = list (x = 0.1 , y = 0.1 ))
dc_fig
Code
sf_regTS <- lm (` UPT - San Francisco--Oakland, CA ` ~ Xt + Zt + Pt, data = df)
sf_summary_table <- tidy (sf_regTS)
sf_summary_table <- sf_summary_table %>%
mutate (
stars = case_when (
p.value < 0.001 ~ "***" ,
p.value < 0.01 ~ "**" ,
p.value < 0.05 ~ "*" ,
p.value < 0.1 ~ "." ,
TRUE ~ ""
),
` Estimate (SE) ` = paste0 (round (estimate, 2 ), stars, " \n (" , round (std.error, 2 ), ")" )
)
sf_summary_table %>%
kable (align = "l" , escape = FALSE ) %>%
kable_styling (full_width = FALSE )
(Intercept)
35372462.21
333042.663
106.210003
0.0e+00
***
35372462.21*** (333042.66)
Xt
12056.78
2637.016
4.572131
7.3e-06
***
12056.78*** (2637.02)
Zt
-29819551.43
720978.676
-41.359824
0.0e+00
***
-29819551.43*** (720978.68)
Pt
306931.40
18454.656
16.631651
0.0e+00
***
306931.4*** (18454.66)
Code
sf_ts <- ts (df$ ` UPT - San Francisco--Oakland, CA ` [1 : 218 ], start= c (2002 ,1 ), frequency = 12 )
auto.arima (sf_ts)
Code
sf_fit <- Arima (sf_ts, order = c (1 ,0 ,1 ), seasonal = list (order = c (0 ,1 ,1 ), period = 12 ))
sf_pred <- forecast (sf_fit, 60 )
sf_pred <- data.frame (sf_pred)
Code
df$ sf_Y_hat <- predict (sf_regTS, newdata = df)
df$ sf_Y_counterfactual <- predict (sf_regTS, newdata = datanew)
sf_fig <- plot_ly ()
sf_fig <- sf_fig %>%
add_lines (x = ~ df$ Xt, y = ~ df$ ` UPT - San Francisco--Oakland, CA ` ,
name = "Observed Ridership" ,
line = list (color = '#cccccc' ))
# Predicted line before and after COVID
sf_fig <- sf_fig %>%
add_lines (x = ~ df$ Xt[1 : 218 ], y = ~ df$ sf_Y_hat[1 : 218 ],
name = "Predicted (Pre-COVID)" ,
line = list (color = '#355464' )) %>%
add_lines (x = ~ df$ Xt[219 : nrow (df)], y = ~ df$ sf_Y_hat[219 : nrow (df)],
name = "Predicted (Post-COVID)" ,
line = list (color = '#68A2B9' ))
sf_fig <- sf_fig %>%
add_lines (x = ~ df$ Xt[219 : nrow (df)], y = ~ df$ sf_Y_counterfactual[219 : nrow (df)],
name = "Linear Counterfactual (No COVID)" ,
line = list (color = '#68A2B9' , dash = 'dash' ))
sf_fig <- sf_fig %>%
add_lines (x = ~ df$ Xt[219 : nrow (df)], y = ~ sf_pred$ Point.Forecast,
name = "ARIMA Counterfactual (No COVID)" ,
line = list (color = '#99D9D9' ))
sf_fig <- sf_fig %>%
add_lines (x = c (218 , 218 ), y = c (min (df$ ` UPT - San Francisco--Oakland, CA ` ), max (df$ ` UPT - San Francisco--Oakland, CA ` )),
name = "COVID-19 Intervention" ,
line = list (color = '#E9072B' , dash = 'dot' ),
showlegend = TRUE )
sf_fig <- sf_fig %>%
layout (title = "San Francisco, CA Ridership: Predicted vs. Counterfactual (COVID-19)" ,
xaxis = list (title = "Time (months)" ),
yaxis = list (title = "Total Passenger Trips" ),
legend = list (x = 0.1 , y = 0.1 ))
sf_fig
Code
philadelphia_regTS <- lm (` UPT - Philadelphia, PA--NJ--DE--MD ` ~ Xt + Zt + Pt, data = df)
philadelphia_summary_table <- tidy (philadelphia_regTS)
philadelphia_summary_table <- philadelphia_summary_table %>%
mutate (
stars = case_when (
p.value < 0.001 ~ "***" ,
p.value < 0.01 ~ "**" ,
p.value < 0.05 ~ "*" ,
p.value < 0.1 ~ "." ,
TRUE ~ ""
),
` Estimate (SE) ` = paste0 (round (estimate, 2 ), stars, " \n (" , round (std.error, 2 ), ")" )
)
philadelphia_summary_table %>%
kable (align = "l" , escape = FALSE ) %>%
kable_styling (full_width = FALSE )
(Intercept)
28139059.779
392932.973
71.612875
0.000000
***
28139059.78*** (392932.97)
Xt
8390.736
3111.224
2.696924
0.007432
**
8390.74** (3111.22)
Zt
-20829593.163
850630.644
-24.487236
0.000000
***
-20829593.16*** (850630.64)
Pt
193927.532
21773.315
8.906661
0.000000
***
193927.53*** (21773.31)
Code
philadelphia_ts <- ts (df$ ` UPT - Philadelphia, PA--NJ--DE--MD ` [1 : 218 ], start= c (2002 ,1 ), frequency = 12 )
auto.arima (philadelphia_ts)
Code
philadelphia_fit <- Arima (philadelphia_ts, order = c (0 ,1 ,1 ), seasonal = list (order = c (0 ,1 ,1 ), period = 12 ))
philadelphia_pred <- forecast (philadelphia_fit, 60 )
philadelphia_pred <- data.frame (philadelphia_pred)
Code
df$ philadelphia_Y_hat <- predict (philadelphia_regTS, newdata = df)
df$ philadelphia_Y_counterfactual <- predict (philadelphia_regTS, newdata = datanew)
philadelphia_fig <- plot_ly ()
philadelphia_fig <- philadelphia_fig %>%
add_lines (x = ~ df$ Xt, y = ~ df$ ` UPT - Philadelphia, PA--NJ--DE--MD ` ,
name = "Observed Ridership" ,
line = list (color = '#cccccc' ))
# Predicted line before and after COVID
philadelphia_fig <- philadelphia_fig %>%
add_lines (x = ~ df$ Xt[1 : 218 ], y = ~ df$ philadelphia_Y_hat[1 : 218 ],
name = "Predicted (Pre-COVID)" ,
line = list (color = '#355464' )) %>%
add_lines (x = ~ df$ Xt[219 : nrow (df)], y = ~ df$ philadelphia_Y_hat[219 : nrow (df)],
name = "Predicted (Post-COVID)" ,
line = list (color = '#68A2B9' ))
philadelphia_fig <- philadelphia_fig %>%
add_lines (x = ~ df$ Xt[219 : nrow (df)], y = ~ df$ philadelphia_Y_counterfactual[219 : nrow (df)],
name = "Linear Counterfactual (No COVID)" ,
line = list (color = '#68A2B9' , dash = 'dash' ))
philadelphia_fig <- philadelphia_fig %>%
add_lines (x = ~ df$ Xt[219 : nrow (df)], y = ~ philadelphia_pred$ Point.Forecast,
name = "ARIMA Counterfactual (No COVID)" ,
line = list (color = '#99D9D9' ))
philadelphia_fig <- philadelphia_fig %>%
add_lines (x = c (218 , 218 ), y = c (min (df$ ` UPT - Philadelphia, PA--NJ--DE--MD ` ), max (df$ ` UPT - Philadelphia, PA--NJ--DE--MD ` )),
name = "COVID-19 Intervention" ,
line = list (color = '#E9072B' , dash = 'dot' ),
showlegend = TRUE )
philadelphia_fig <- philadelphia_fig %>%
layout (title = "Philadelphia, PA Ridership: Predicted vs. Counterfactual (COVID-19)" ,
xaxis = list (title = "Time (months)" ),
yaxis = list (title = "Total Passenger Trips" ),
legend = list (x = 0.1 , y = 0.1 ))
philadelphia_fig
Code
boston_regTS <- lm (` UPT - Boston, MA--NH ` ~ Xt + Zt + Pt, data = df)
boston_summary_table <- tidy (boston_regTS)
boston_summary_table <- boston_summary_table %>%
mutate (
stars = case_when (
p.value < 0.001 ~ "***" ,
p.value < 0.01 ~ "**" ,
p.value < 0.05 ~ "*" ,
p.value < 0.1 ~ "." ,
TRUE ~ ""
),
` Estimate (SE) ` = paste0 (round (estimate, 2 ), stars, " \n (" , round (std.error, 2 ), ")" )
)
boston_summary_table %>%
kable (align = "l" , escape = FALSE ) %>%
kable_styling (full_width = FALSE )
(Intercept)
31088901.40
328463.601
94.649457
0.00e+00
***
31088901.4*** (328463.6)
Xt
10617.87
2600.759
4.082605
5.85e-05
***
10617.87*** (2600.76)
Zt
-23835185.43
711065.816
-33.520365
0.00e+00
***
-23835185.43*** (711065.82)
Pt
242632.43
18200.919
13.330779
0.00e+00
***
242632.43*** (18200.92)
Code
boston_ts <- ts (df$ ` UPT - Boston, MA--NH ` [1 : 218 ], start= c (2002 ,1 ), frequency = 12 )
auto.arima (boston_ts)
Code
boston_fit <- Arima (boston_ts, order = c (0 ,1 ,1 ), seasonal = list (order = c (0 ,1 ,2 ), period = 12 ))
boston_pred <- forecast (boston_fit, 60 )
boston_pred <- data.frame (boston_pred)
Code
df$ boston_Y_hat <- predict (boston_regTS, newdata = df)
df$ boston_Y_counterfactual <- predict (boston_regTS, newdata = datanew)
boston_fig <- plot_ly ()
boston_fig <- boston_fig %>%
add_lines (x = ~ df$ Xt, y = ~ df$ ` UPT - Boston, MA--NH ` ,
name = "Observed Ridership" ,
line = list (color = '#cccccc' ))
# Predicted line before and after COVID
boston_fig <- boston_fig %>%
add_lines (x = ~ df$ Xt[1 : 218 ], y = ~ df$ boston_Y_hat[1 : 218 ],
name = "Predicted (Pre-COVID)" ,
line = list (color = '#355464' )) %>%
add_lines (x = ~ df$ Xt[219 : nrow (df)], y = ~ df$ boston_Y_hat[219 : nrow (df)],
name = "Predicted (Post-COVID)" ,
line = list (color = '#68A2B9' ))
boston_fig <- boston_fig %>%
add_lines (x = ~ df$ Xt[219 : nrow (df)], y = ~ df$ boston_Y_counterfactual[219 : nrow (df)],
name = "Linear Counterfactual (No COVID)" ,
line = list (color = '#68A2B9' , dash = 'dash' ))
boston_fig <- boston_fig %>%
add_lines (x = ~ df$ Xt[219 : nrow (df)], y = ~ boston_pred$ Point.Forecast,
name = "ARIMA Counterfactual (No COVID)" ,
line = list (color = '#99D9D9' ))
boston_fig <- boston_fig %>%
add_lines (x = c (218 , 218 ), y = c (min (df$ ` UPT - Boston, MA--NH ` ), max (df$ ` UPT - Boston, MA--NH ` )),
name = "COVID-19 Intervention" ,
line = list (color = '#E9072B' , dash = 'dot' ),
showlegend = TRUE )
boston_fig <- boston_fig %>%
layout (title = "Boston, MA Ridership: Predicted vs. Counterfactual (COVID-19)" ,
xaxis = list (title = "Time (months)" ),
yaxis = list (title = "Total Passenger Trips" ),
legend = list (x = 0.1 , y = 0.1 ))
boston_fig
Code
seattle_regTS <- lm (` UPT - Seattle--Tacoma, WA ` ~ Xt + Zt + Pt, data = df)
seattle_summary_table <- tidy (seattle_regTS)
seattle_summary_table <- seattle_summary_table %>%
mutate (
stars = case_when (
p.value < 0.001 ~ "***" ,
p.value < 0.01 ~ "**" ,
p.value < 0.05 ~ "*" ,
p.value < 0.1 ~ "." ,
TRUE ~ ""
),
` Estimate (SE) ` = paste0 (round (estimate, 2 ), stars, " \n (" , round (std.error, 2 ), ")" )
)
seattle_summary_table %>%
kable (align = "l" , escape = FALSE ) %>%
kable_styling (full_width = FALSE )
(Intercept)
12330354.08
146146.581
84.36977
0
***
12330354.08*** (146146.58)
Xt
32764.04
1157.182
28.31366
0
***
32764.04*** (1157.18)
Zt
-13493538.07
316381.594
-42.64957
0
***
-13493538.07*** (316381.59)
Pt
124638.81
8098.316
15.39071
0
***
124638.81*** (8098.32)
Code
seattle_ts <- ts (df$ ` UPT - Seattle--Tacoma, WA ` [1 : 218 ], start= c (2002 ,1 ), frequency = 12 )
auto.arima (seattle_ts)
Code
seattle_fit <- Arima (seattle_ts, order = c (3 ,0 ,0 ), seasonal = list (order = c (0 ,1 ,2 ), period = 12 ), include.drift = TRUE )
seattle_pred <- forecast (seattle_fit, 60 )
seattle_pred <- data.frame (seattle_pred)
Code
df$ seattle_Y_hat <- predict (seattle_regTS, newdata = df)
df$ seattle_Y_counterfactual <- predict (seattle_regTS, newdata = datanew)
seattle_fig <- plot_ly ()
seattle_fig <- seattle_fig %>%
add_lines (x = ~ df$ Xt, y = ~ df$ ` UPT - Seattle--Tacoma, WA ` ,
name = "Observed Ridership" ,
line = list (color = '#cccccc' ))
# Predicted line before and after COVID
seattle_fig <- seattle_fig %>%
add_lines (x = ~ df$ Xt[1 : 218 ], y = ~ df$ seattle_Y_hat[1 : 218 ],
name = "Predicted (Pre-COVID)" ,
line = list (color = '#355464' )) %>%
add_lines (x = ~ df$ Xt[219 : nrow (df)], y = ~ df$ seattle_Y_hat[219 : nrow (df)],
name = "Predicted (Post-COVID)" ,
line = list (color = '#68A2B9' ))
seattle_fig <- seattle_fig %>%
add_lines (x = ~ df$ Xt[219 : nrow (df)], y = ~ df$ seattle_Y_counterfactual[219 : nrow (df)],
name = "Linear Counterfactual (No COVID)" ,
line = list (color = '#68A2B9' , dash = 'dash' ))
seattle_fig <- seattle_fig %>%
add_lines (x = ~ df$ Xt[219 : nrow (df)], y = ~ seattle_pred$ Point.Forecast,
name = "ARIMA Counterfactual (No COVID)" ,
line = list (color = '#99D9D9' ))
seattle_fig <- seattle_fig %>%
add_lines (x = c (218 , 218 ), y = c (min (df$ ` UPT - Seattle--Tacoma, WA ` ), max (df$ ` UPT - Seattle--Tacoma, WA ` )),
name = "COVID-19 Intervention" ,
line = list (color = '#E9072B' , dash = 'dot' ),
showlegend = TRUE )
seattle_fig <- seattle_fig %>%
layout (title = "Seattle, WA Ridership: Predicted vs. Counterfactual (COVID-19)" ,
xaxis = list (title = "Time (months)" ),
yaxis = list (title = "Total Passenger Trips" ),
legend = list (x = 0.1 , y = 0.1 ))
seattle_fig
Code
miami_regTS <- lm (` UPT - Miami--Fort Lauderdale, FL ` ~ Xt + Zt + Pt, data = df)
miami_summary_table <- tidy (miami_regTS)
miami_summary_table <- miami_summary_table %>%
mutate (
stars = case_when (
p.value < 0.001 ~ "***" ,
p.value < 0.01 ~ "**" ,
p.value < 0.05 ~ "*" ,
p.value < 0.1 ~ "." ,
TRUE ~ ""
),
` Estimate (SE) ` = paste0 (round (estimate, 2 ), stars, " \n (" , round (std.error, 2 ), ")" )
)
miami_summary_table %>%
kable (align = "l" , escape = FALSE ) %>%
kable_styling (full_width = FALSE )
(Intercept)
16667041.419
228430.549
72.963277
0e+00
***
16667041.42*** (228430.55)
Xt
-9983.082
1808.702
-5.519473
1e-07
***
-9983.08*** (1808.7)
Zt
-8084986.106
494511.884
-16.349427
0e+00
***
-8084986.11*** (494511.88)
Pt
89171.016
12657.859
7.044715
0e+00
***
89171.02*** (12657.86)
Code
miami_ts <- ts (df$ ` UPT - Miami--Fort Lauderdale, FL ` [1 : 218 ], start= c (2002 ,1 ), frequency = 12 )
auto.arima (miami_ts)
Code
miami_fit <- Arima (miami_ts, order = c (2 ,1 ,0 ), seasonal = list (order = c (2 ,0 ,0 ), period = 12 ))
miami_pred <- forecast (miami_fit, 60 )
miami_pred <- data.frame (miami_pred)
Code
df$ miami_Y_hat <- predict (miami_regTS, newdata = df)
df$ miami_Y_counterfactual <- predict (miami_regTS, newdata = datanew)
miami_fig <- plot_ly ()
miami_fig <- miami_fig %>%
add_lines (x = ~ df$ Xt, y = ~ df$ ` UPT - Miami--Fort Lauderdale, FL ` ,
name = "Observed Ridership" ,
line = list (color = '#cccccc' ))
# Predicted line before and after COVID
miami_fig <- miami_fig %>%
add_lines (x = ~ df$ Xt[1 : 218 ], y = ~ df$ miami_Y_hat[1 : 218 ],
name = "Predicted (Pre-COVID)" ,
line = list (color = '#355464' )) %>%
add_lines (x = ~ df$ Xt[219 : nrow (df)], y = ~ df$ miami_Y_hat[219 : nrow (df)],
name = "Predicted (Post-COVID)" ,
line = list (color = '#68A2B9' ))
miami_fig <- miami_fig %>%
add_lines (x = ~ df$ Xt[219 : nrow (df)], y = ~ df$ miami_Y_counterfactual[219 : nrow (df)],
name = "Linear Counterfactual (No COVID)" ,
line = list (color = '#68A2B9' , dash = 'dash' ))
miami_fig <- miami_fig %>%
add_lines (x = ~ df$ Xt[219 : nrow (df)], y = ~ miami_pred$ Point.Forecast,
name = "ARIMA Counterfactual (No COVID)" ,
line = list (color = '#99D9D9' ))
miami_fig <- miami_fig %>%
add_lines (x = c (218 , 218 ), y = c (min (df$ ` UPT - Miami--Fort Lauderdale, FL ` ), max (df$ ` UPT - Miami--Fort Lauderdale, FL ` )),
name = "COVID-19 Intervention" ,
line = list (color = '#E9072B' , dash = 'dot' ),
showlegend = TRUE )
miami_fig <- miami_fig %>%
layout (title = "Miami, FL Ridership: Predicted vs. Counterfactual (COVID-19)" ,
xaxis = list (title = "Time (months)" ),
yaxis = list (title = "Total Passenger Trips" ),
legend = list (x = 0.1 , y = 0.1 ))
miami_fig
Interpreting Results
The following effects apply to all 10 datasets visualized above:
Immediate Effect: There is a clear immediate drop as COVID-19 begins to take effect, as has been discussed previously.
Sustained Effect: Trends have changed since the intervention, as gradients after COVID-19 are greater all around. This is due to the rebound in ridership which in some cases has been rapid enough to return near prior values.
Counterfactual: Had the intervention not occurred, national ridership would likely have continued to climb gradually. This differs between cities, but none would have experienced the rapid decrease followed by the somewhat sharp rebound if it weren’t for an interruption.
Delayed Effect: Several data points after the interruption, predicted values have still not caught up to counterfactuals, indicating that in 2025 we are still seeing the effects of COVID-19 on public transit ridership.
Looking at each of the individual cities, there are some important distinctions that can be made between metropolitan areas. Firstly, cities like New York and Seattle had clearly increasing ridership prior to the interruption, while cities like Miami and Los Angeles were already decreasing. In the decreasing cases, current values are already almost or exactly at a point where they may have been without the pandemic entirely, considering drops could have continued. Meanwhile, cities with previously healthy ridership trends tend to see a more rapid recovery after the pandemic, indicating that their service has been continued to be sought out by residents. More interpretations will be outlined on the Conclusions page.